Gathering Metadata from Web-Based Repositories of Historical Publications
Authors
Abstract
In this paper we examine the problem of extracting schema-conforming metadata from HTML sources. We describe a technique founded on semistructured data analysis. It combines HTML styles, which abstract the visual characteristics of documents, with document-oriented context-free grammars, which provide structural information. The technique is flexible enough to be applied not only to individual HTML documents but also to hyperlinked web structures, which gives an informed and tightly controlled way of navigating the repositories.
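As a rough illustration of the idea in the abstract, the following minimal Python sketch maps HTML tags to abstract style tokens and then matches the token stream against a small grammar to build metadata records. It is not the authors' implementation: the tag-to-style mapping (`STYLE_OF_TAG`), the grammar `record -> TITLE AUTHOR+ YEAR?`, and all function names are hypothetical and chosen only for the example.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML tags to abstract "styles" (an assumption
# made for this sketch, not taken from the paper).
STYLE_OF_TAG = {"h2": "TITLE", "i": "AUTHOR", "b": "YEAR"}


class StyleTokenizer(HTMLParser):
    """Turn HTML into a flat list of (style, text) tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []
        self._style = None  # style of the most recently opened tag, if any

    def handle_starttag(self, tag, attrs):
        # Tags without a known style reset the current style to None.
        self._style = STYLE_OF_TAG.get(tag)

    def handle_endtag(self, tag):
        self._style = None

    def handle_data(self, data):
        text = data.strip()
        if self._style and text:
            self.tokens.append((self._style, text))


def parse_records(tokens):
    """Match tokens against the toy grammar: record -> TITLE AUTHOR+ YEAR?"""
    records, pos = [], 0
    while pos < len(tokens):
        style, text = tokens[pos]
        if style != "TITLE":
            pos += 1  # skip tokens that cannot start a record
            continue
        record = {"title": text, "authors": [], "year": None}
        pos += 1
        while pos < len(tokens) and tokens[pos][0] == "AUTHOR":
            record["authors"].append(tokens[pos][1])
            pos += 1
        if pos < len(tokens) and tokens[pos][0] == "YEAR":
            record["year"] = tokens[pos][1]
            pos += 1
        if record["authors"]:  # the grammar requires at least one author
            records.append(record)
    return records


def extract_metadata(html):
    """Tokenize the HTML by style, then parse the tokens into records."""
    tokenizer = StyleTokenizer()
    tokenizer.feed(html)
    tokenizer.close()
    return parse_records(tokenizer.tokens)
```

Separating the style layer from the grammar means that, under these assumptions, a repository with different markup would only need a different `STYLE_OF_TAG` mapping, while the grammar describing the record structure stays unchanged.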
Similar resources
Identifying Bibliographic Relationships in the Catalog of the National Library of Iran Based on the Functional Requirements for Bibliographic Records (FRBR) Model: A First Step in Representing the Knowledge Network of Iranian-Islamic Publications
The aim of this study is to find out the bibliographic relationships between the metadata records in the National Library and Archives of Iran (NLAI) according to FRBR model, in order to represent the Knowledge network of Iranian-Islamic publications. To achieve this objective, the content analysis method was used. The study population includes metadata records for books in NLAI for four biblio...
Metadata for Adaptive and Distributed Learning Repositories
Web-based learning is one of the important applications of the World Wide Web, which makes possible location-and time-independent learning scenarios. Content can also be kept up-to-date, discussions and interactions between instructors and learners can be supported, new materials can easily be distributed to the students. We will discuss in this talk three aspects of web-based learning material...
Examining the Reaction of Web Search Engines to Metadata Records Based on the Combined Method of Microdata and Linked Data
The purpose of this research was to find out the reaction of Web search engines to metadata records created based on the combined method of Rich Snippets and Linked Data. 200 metadata records in two groups (100 records as the control group with the normal structure, and 100 records created based on microdata and implemented in RDF/XML as the experimental group) extracted from the information gatewa...
Modeling, Exploring and Recommending Music in Its Complexity
Knowledge models currently in use for describing music metadata are insufficient to express the wealth of complex information about creative works, expressions, performances, publications, authors and performers. In this research, we aim to propose a method for structuring the classical music information coming from heterogeneous library repositories. In particular, we rese...
Managing metadata in open learning repositories and P2P networks
“Now, miraculously, we have the Web. For the documents in our lives, everything is simple and smooth. But for data, we are still pre-Web.” (Tim Berners-Lee, Business Model for the Semantic Web) The successful use and re-use, search, and operation of data depend on the effective definition, use and management of metadata. The first part of this thesis considers the issues related to learning me...